Comparison of the proteome of human exocrine pancreatic tissue left over from an islet preparation with that of cultured human ductal organoids. Samples were sent to the Proteomics core by Christos: 4 technical replicates for the organoids and 4 for the human prep tissue.
Are the missing values dispersed randomly through the data, or is missingness itself a feature, because the different samples follow different trajectories and hence have different proteomes? Here the values are mostly MNAR (missing not at random): many proteins are present either in Organoids or in Tissue, but not in both.
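One way to check this is to count, for each protein, how many replicates are missing in each condition. The following is a minimal base R sketch on a made-up toy matrix; the matrix `x`, its sample names and the `condition` vector are hypothetical and not the data from this analysis.

```r
# Toy log-intensity matrix: 4 proteins x 8 samples (hypothetical values)
x <- matrix(
  c(20, 21, 20, 22, NA, NA, NA, NA,   # present only in Organoids
    NA, NA, NA, NA, 25, 24, 26, 25,   # present only in Tissue
    18, NA, 19, 18, 19, 18, NA, 19,   # scattered missing values
    23, 23, 24, 22, 23, 24, 23, 22),  # complete
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("prot", 1:4),
                  c(paste0("Org_", 1:4), paste0("Tis_", 1:4)))
)
condition <- rep(c("Organoids", "Tissue"), each = 4)

# Fraction of missing replicates per protein (rows) and condition (columns)
miss_frac <- sapply(split(seq_len(ncol(x)), condition), function(cols) {
  rowMeans(is.na(x[, cols, drop = FALSE]))
})

# Proteins completely missing in one condition and fully observed in the other
on_off <- rownames(miss_frac)[
  apply(miss_frac, 1, function(f) any(f == 1) && any(f == 0))
]
print(miss_frac)
print(on_off)
```

A large share of such on/off proteins points towards condition-dependent (MNAR) missingness rather than randomly scattered gaps.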
Imputing along margin 2 (samples/columns).
Code
NAs <- is.na(assay(norm_pg_sample))

## the imputed values by different methods.
imps <- list("GSimp" = imp_pg_GSimp,
             "QRILC" = imp_pg_QRILC,
             "MinProb" = imp_pg_MinProb,
             "RF" = imp_pg_RF,
             "knn" = imp_pg_knn) %>%
  lapply(function(se) {
    x = assay(se) %>%
      data.frame %>%
      gather("label", "value") %>%
      left_join(colData(se)[c("label", "condition")], copy = T) %>%
      magrittr::extract(as.vector(NAs), )
  }) %>%
  data.table::rbindlist(idcol = "method")
Joining with `by = join_by(label)`
Code
## the original normalized values without imputation
nonimps <- assay(norm_pg_sample) %>%
  data.frame %>%
  gather("label", "value") %>%
  left_join(colData(norm_pg_sample)[c("label", "condition")], copy = T) %>%
  magrittr::extract(!as.vector(NAs), ) %>%
  mutate(method = "non_impute") %>%
  dplyr::select(method, everything())
More trees in the random forest hardly make a difference (RF vs RF2). KNN imputes values that are shifted too far to the right, as expected, since it is better suited to MAR (missing at random) values. To me, GSimp seems to work best: the values here are quite clearly MNAR, so a strongly left-shifted imputation is desired.
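To illustrate what a left-shifted imputation means, here is a minimal base R sketch that draws the missing values from a narrow normal distribution centred on a low quantile of the observed values. It is only in the spirit of left-censored methods such as MinProb; it is not the GSimp algorithm and not the code used in this analysis. The function `impute_left_shift`, its parameters `q` and `scale`, and the simulated vector `obs` are all hypothetical.

```r
set.seed(1)

# Hypothetical log2 intensities: 200 observed values plus 20 missing ones
obs <- c(rnorm(200, mean = 25, sd = 2), rep(NA, 20))

# Replace NAs with draws centred on the q-th quantile of the observed values.
# `scale` shrinks the spread so the imputed values stay in the low-intensity tail.
impute_left_shift <- function(v, q = 0.01, scale = 0.3) {
  observed <- v[!is.na(v)]
  mu <- unname(quantile(observed, probs = q))
  sigma <- sd(observed) * scale
  v[is.na(v)] <- rnorm(sum(is.na(v)), mean = mu, sd = sigma)
  v
}

imputed <- impute_left_shift(obs)

# The imputed values sit well below the bulk of the observed distribution
print(mean(imputed[is.na(obs)]))
print(mean(obs, na.rm = TRUE))
```

A neighbour-based method such as KNN instead borrows values from similar, observed proteins, which is why its imputations end up closer to the centre of the distribution.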
6.4 Evaluate effect of Imputation on PCA for one example
8 Testing significant differences between conditions
BH (Benjamini-Hochberg) correction of p-values for multiple testing
Code
# alpha is the significance level, lfc is the log fold change threshold
data_imp %>%
  test_diff(type = "all", control = "Embryo", fdr.type = "BH") %>%
  add_rejections(alpha = 0.05, lfc = 1) -> data_test
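The BH adjustment itself can be reproduced in base R with `p.adjust`. The sketch below uses five made-up p-values (hypothetical, not results from this dataset) and compares the built-in function with the step-up formula written out by hand.

```r
# Hypothetical raw p-values
p <- c(0.001, 0.01, 0.02, 0.04, 0.5)

# Built-in Benjamini-Hochberg adjustment
p_adj <- p.adjust(p, method = "BH")

# The same adjustment by hand: scale each p-value by n / rank,
# then enforce monotonicity starting from the largest p-value
n <- length(p)
o <- order(p, decreasing = TRUE)
manual <- numeric(n)
manual[o] <- pmin(1, cummin(p[o] * n / (n:1)))

print(p_adj)
print(all.equal(p_adj, manual))

# Which tests would pass a 0.05 threshold on the adjusted p-value
significant <- p_adj <= 0.05
```

The adjusted p-value is only one of the two criteria applied above; the `lfc` threshold is applied to the fold change separately.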
features <- c(50, 150, 500, 1500)
plot_list <- list()
for (feature in features) {
  p <- plot_pca(data_test, label = F, n = feature, features = "Proteins",
                indicate = c("label", "condition")) +
    scale_colour_manual(values = sample_color_vec) +
    theme_cowplot(font_size = 6) +
    ggtitle(paste("PCA of top", feature, "proteins")) +
    theme(aspect.ratio = 1)
  # Store each plot in plot_list under its feature count. Assigning with
  # p[feature] <- p leaves plot_list empty and raises
  # "number of items to replace is not a multiple of replacement length".
  plot_list[[as.character(feature)]] <- p
}
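Restricting a PCA to the n most variable proteins can be sketched in base R as follows. This is an illustration on simulated data and assumes the selection is done by row variance; the helper `top_n_pca` and the matrix `mat` are hypothetical and are not part of the plotting code above.

```r
set.seed(42)

# Hypothetical matrix: 100 proteins (rows) x 8 samples (columns)
mat <- matrix(rnorm(800), nrow = 100,
              dimnames = list(paste0("prot", 1:100), paste0("s", 1:8)))

# Keep the n rows with the highest variance, then run a PCA on the samples
top_n_pca <- function(m, n) {
  vars <- apply(m, 1, var)
  keep <- order(vars, decreasing = TRUE)[seq_len(min(n, nrow(m)))]
  prcomp(t(m[keep, , drop = FALSE]))
}

# One PCA per feature count, stored in a named list
feature_counts <- c(10, 50)
pcas <- setNames(lapply(feature_counts, function(n) top_n_pca(mat, n)),
                 as.character(feature_counts))

# Proportion of variance explained by the first two components for each n
var_explained <- sapply(pcas, function(pc) {
  (pc$sdev^2 / sum(pc$sdev^2))[1:2]
})
print(var_explained)
```

Comparing the results across several values of n shows how much the sample separation depends on the number of proteins included.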
Code
if (save) {
  library(gridExtra) # for grid.arrange

  # Arrange 4 plots per page (adjust per your preference)
  n_per_page <- 4
  n_pages <- ceiling(length(plot_list) / n_per_page)

  pdf(file = file.path(fig_dir, paste0(date, "_PCA_1500_tiled.pdf")),
      width = 12, height = 8)
  for (i in seq_len(n_pages)) {
    idx_start <- (i - 1) * n_per_page + 1
    idx_end <- min(i * n_per_page, length(plot_list))
    plots_to_print <- plot_list[idx_start:idx_end]
    grid.arrange(grobs = plots_to_print, ncol = 2)
  }
  dev.off()
}